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Abstract. In recent years, there has been a true revival of the nonsymmetric Lanczos method. On 
the one hand, the possible breakdowns in the classical algorithm are now better understood, and so- 
called look-ahead variants of the Lanczos process have been developed, which remedy this problem. 
On the other hand, various new Lanczos-based iterative schemes for solving nonsymmetric linear 
systems have been proposed. This paper gives a survey of some of these recent developments. 


1 Introduction 

Many numerical computations involve the solution of large nonsingular systems of linear equations 

Ax = b. (1-1) 


For example, such systems arise from finite difference or finite element approximations to partial 
differential equations (PDEs), as intermediate steps in computing the solution of nonlinear prob- 
lems, or as subproblems in large-scale linear and nonlinear programming. Typically, the coefficient 
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matrix A of (1.1) is sparse and highly structured. A natural way to exploit the sparsity of A in the 
solution process is to use iterative techniques, which involve A only in the form of matrix-vector 
products. Most iterative schemes of this type fall into the category of Krylov subspace methods: 
they produce approximations x n to A~ 1 b of the form 

x n € 3-0 d" ^n( r 0> n = lj 2, . . . . (1-2) 

Here x 0 is any initial guess for A _1 6, r 0 := b - Ax 0 is the corresponding residual vector, and 

K n (r 0 ,A) : = span{r 0 , Ar 0 , . . . , A n_1 r 0 } (1.3) 

is the nth Krylov subspace generated by r 0 and A. 

The most powerful iterative method of this type is the conjugate gradient algorithm (CG) due 
to Hestenes and Stiefel [33], which is a scheme for linear systems (1.1) with Hermitian positive 
definite A. Although CG was introduced as early as 1952, its true potential was not appreciated 
until the 1970s. In 1971, Reid [45] revived interest in the method when he demonstrated its 
usefulness for solving linear systems arising from self-adjoint elliptic PDEs. Moreover, it was realized 
(see, e.g., [7]) that the performance of CG can be enhanced by combining it with preconditioning, 
and efficient preconditioners, such as the incomplete Cholesky factorization [40], were developed. 

Thereafter, the success of CG triggered an extensive search for CG-type Krylov subspace meth- 
ods for non-Hermitian linear systems, and a number of such algorithms have been proposed: we 
refer the reader to [1, 51, 48, 47, 17] and the references given there. Among the many properties 
of CG, the following two are the most important ones: its nth iterate is defined by a minimization 
property over A' n (r 0 , A), and the algorithm is based on three-term vector recurrences. Ideally, a 
CG-like method for non-Hermitian matrices would have features similar to these two. It would 
produce iterates x n in (1.2) that: 

(i) are characterized by a minimization property over /^(fq, A), such as the minimal residual 
property 

\\b - Ax n || = min ||6-Ax||, i n 6i 0 +^( r o^); 

ar€x 0 +An(ro,i4) 

(ii) can be computed with little work per iteration and low overall storage requirements. 

Unfortunately, it turns out that, for general non-Hermitian matrices, one cannot fulfill (i) and (ii) 
simultaneously. This result is due to Faber and Manteuffel [10, 11] who have shown that, except 
for a few anomalies, CG-type algorithms with (i) and (ii) exist only for matrices of the special form 

A = e ,8 (T + <rl), where T = T H , 9 6 R, <x G C, (1.4) 

(see also Voevodin [55] and Joubert and Young [35]). Note that the class (1.4) consists of just the 
shifted and rotated Hermitian matrices. We remark that the important subclass of real nonsym- 
metric matrices 

A = / — S, where S = —S T is real, (1-5) 

is contained in (1.4), with e iB = », a = -i, and T = iS. Concus and Golub [6] and Widlund [56] 
were the first to devise a CG-type algorithm for the family (1.5). 

Most of the non-Hermitian Krylov subspace methods that have been proposed satisfy either 
(i) or (ii). Until recently, the emphasis was on requirement (i), and numerous algorithms with 


Advances in Lanczos-based Methods for Linear Systems 


3 


iterates characterized by (i) or a similar condition have been developed, starting with Vinsome’s 
Orthomin [54]. The most widely used method in this class is the generalized minimal residual 
algorithm (GMRES) due to Saad and Schultz [49]. Of course, none of these methods fulfills (u), 
and indeed, for all these algorithms work per iteration and overall storage requirements grow linearly 
with the iteration number n. Consequently, in practice one cannot afford to run the full version 
of these algorithms, and it is necessary to use restarts. For difficult problems, this often results in 
very slow convergence. 

The second category of CG-like non-Hermitian Krylov subspace methods consists of schemes 
that satisfy (ii), but not (i). The archetype in this class is the classical biconjugate gradient algorithm 
(BCG), which was proposed by Lanczos [38] already in 1952 and later revived by Fletcher [12] in 
1976. Since no minimization condition of type (i) holds for BCG, the algorithm can exhibit and 
typically does — a rather irregular convergence behavior with wild oscillations in the residual norm. 
Even worse, breakdowns in the form of division by 0 may be encountered during the iteration 
process. In finite precision arithmetic, such exact breakdowns are very unlikely; however, near- 
breakdowns may occur, leading to numerical instabilities in subsequent iterations. 

The BCG method is intimately connected with the nonsymmetric Lanczos process [37] for 
tridiagonalizing square matrices. In particular, the Lanczos algorithm in its original form is also 
susceptible to breakdowns and potential numerical instabilities. In recent years, there has been 
a true revival of the nonsymmetric Lanczos process. On the one hand, the possible breakdowns 
in the classical algorithm are now better understood, and so-called look-ahead variants of the 
Lanczos process have been developed, which remedy this problem. On the other hand, various new 
Lanczos-based Krylov subspace methods for solving general non-Hermitian linear systems have 
been proposed. Here we review some of these recent developments. 

The remainder of the paper is organized as follows. In Section 2, we focus on the nonsymmetric 
Lanczos process; in particular, we sketch a look-ahead variant of the method and briefly discuss 
related work. We then turn to Lanczos-based Krylov subspace algorithms for non-Hermitian linear 
systems. First, in Section 3, we consider the recently proposed quasi-minimal residual method 
(QMR) and outline two implementations. In addition to matrix-vector products with the coefficient 
matrix A of (1.1), BCG and QMR also require multiplications with its transpose A T . This is a 
disadvantage for certain applications where A T is not readily available. It is possible to devise 
Lanczos-based methods that do not involve A T , and in Section 4, we survey some of these so-called 
transpose-free schemes. In Section 5, we make some concluding remarks. 

Throughout the paper, all vectors and matrices are allowed to have real or complex entries. 
As usual, M t and M H denote the transpose and conjugate transpose of a matrix M, respectively. 
The vector norm ||z|| = \/x H x is always the Euclidean norm. The notation 

V n = {^(A) s < 7 0 + <T 1 X + ••• + <7 n A n | cr 0 ,...,<T n € C} 

is used for the set of all complex polynomials of degree at most n. Finally, A is always assumed to 
be a square matrix of order N. 


2 The Nonsymmetric Lanczos Process 

In this section, we consider the nonsymmetric Lanczos process. Here the matrix A is not required 
to be nonsingular. 
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2.1 A Look-Ahead Lanczos Algorithm 

The Lanczos method in its original form as proposed by Lanczos [37] can break down prematurely. 
Taylor [52] and Parlett, Taylor, and Liu [44] — with their look-ahead Lanczos algorithm — were the 
first to devise a variant of the classical process that skips over possible breakdowns. We use the term 
look-ahead Lanczos method in a broader sense to denote any extension of the standard algorithm 
that circumvents breakdowns. In this section, we sketch an implementation of a look-ahead Lanczos 
algorithm that was recently developed by Freund, Gutknecht, and Nachtigal [18]. 

Given two nonzero starting vectors f ^ E and tiq € C^, the look-ahead Lanczos process 
generates two sequences of vectors and { u ’j } j*= l su< -h that, for n = 1,2, . . ., 

span^.vj,...,^} = K n (v t ,A), (2.1) 

span{t/J 1 ,u; 2 ,...,u; n } = K n (w u A T ). 

Here, K n (v l ,A) and K n (w l ,A T ) denote the nth Krylov subspace of C' v generated by and A, 
and w x and A T , respectively (cf. (1.3)). Moreover, the Lanczos vectors are constructed so that the 
block biorthogonality relation 

(WWfvW = ( = j,k = 1, . . . , (2.2) 

L o if j # k, 

holds. Here, the matrices V and contain the Lanczos vectors built during the fcth look-ahead 
step. More precisely, 

= V+1 ••• v + 1 -i], it = i, — i, 

w (k) = [tu nk tu njk+1 m„* +1 -i], 

and 

V {1) = K, u n,+l "• 
w (,) = [tu n( Ui n|+1 ••• W„], 

where 

1 = n x < n 2 < • • ■ < n k < ■ ■ ■ < n t < n < n l+1 . 

The first vectors v and w nk in each block are called regular, and any remaining vectors are called 
inner. Note that l = l(n) denotes the index of the last constructed regular vector. Furthermore, in 
(2.2), the blocks D (k) are nonsingular for k = 1, ...,/- 1, and D W is nonsingular if n = n /+1 - 1. 
With these preliminaries, the look-ahead Lanczos algorithm can be sketched as follows. 

Algorithm 2.1 (Sketch of the look-ahead Lanczos process) 

0) Choose nonzero vectors v x , w 1 £ C N . 

Set 0 1 ) = v lt = w lt DW = (WM) T V {1] . 

Set n k = 1, l = 1, v 0 = w 0 = 0, V 0 = W Q = 0. 

For n = 1,2, . . ., do : 

1) Decide whether to construct u n+1 and tu n+1 as regular or inner vectors 
and go to 2) or 3), respectively. 
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2) ( Regular step.) Compute 

p n = (DW)- l (WW) T A t> n . 

I/ n = (D( , - 1 ))- 1 (W (,_1) ) r > lv n. 

Vi = - v®/*. - 

Set „ l+l = n + 1, 1 = f + 1, V 10 = W'" 1 = t. onrf S« 10 4 >- 

3) ( Inner step.) Compute 

u n = Av n , 

U n+1 = Av n - Cn v n ~ Vn V n-l ~ ^ 

W n+1 = A T U! n - C n W n ~ Vn W n-l ~ W^ ( 1 _ 1 ) ^n- 

4) Ifv n+ 1 = 0 or u? n+1 = 0, stop. Otherwise, set 

y(') = [ y(') Vn+1 ] , w (,) = [ ww w n+l ] , 
D (‘) = (wV)) T v {l \ 
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(2.3) 


(2.4) 


In [18], it is shown how one can implement Algorithm 2.1 so that only two inner products are 
computed at every step, for either p n and v n in (2.3), or for v n in (2.4). The cruci P ar ‘ ° 
Algorithm 2.1 is the look-ahead strategy used in step 1). As described in [18], the decision m 1) is 
baled on three checks. For a regular step, it is necessary that Df > be nonsingular. Therefore 
of the checks monitors the size of smallest singular value of . The other two checks ai p 

to ensure the Unear independence of the Lanczos vectors. The i algorithm momtors the «ze of _1 
components p n and i/ n along the two previous blocks V«> and respectively and 

in (2 P .3), and performs a regular step only if these terms do not dominate the components Av n an 
A T Wn i n t he new Krylov spaces. Complete details of the implementation of the look-ahead Lanczos 

Algorithm 2.1 are given in [18]. . ... , _ n n _„ 

We note that, in (2.4), C„ and r? n are arbitrary inner recurrence coefficients, with C n , - 0. 

possibiUty is to choose the Chebyshev iteration [25, 39] parameters for C n and rj n . However, since 
the length of look-ahead steps is usually small, the choice of the inner recurrence c^fficmnts is not 
crucial; in our experience, C n = 1 and, if » # »*, Vn = U ™ ks satisfactorily. Indeed, with the look- 
ahead strategy proposed in [18], the algorithm performs mostly regular steps, and typicaHy, ° n 
a few look-ahead steps of length bigger than 1 occur. In our experiments, the longest look-ahead 

we encountered was of length 4. , 

For later use, we remark that the recurrences in (2.3) and (2.4) can be written compactly in 

matrix form. For example, for the right Lanczos vectors t>„, we have 


AV n = V n+1 H n , 


(2.5) 


where 


:= [«! 
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c*i /? 2 0 • • • 0 

72 Q 2 *'• : 

0 ••• 0 

: ’ ’ • ’ ' • ’ • 0i 

: 7 i «i 

0 0 7/+i 

is a block tridiagonal matrix. The diagonal blocks a k are square unreduced upper Hessenberg 
matrices, whose size is equal to the number of vectors in the corresponding block V ( k K The 
matrices 7 ^ have only one nonzero element, in their upper right corner, and thus H n is an upper 
Hessenberg matrix, with full rank 

rank5’ n = n. (2-6) 

If only regular steps 2) are performed, then the Algorithm 2.1 reduces to the classical Lanc- 
zos process. In this case, the blocks and W consist of just the single vector v k and w k , 
respectively, and the orthogonality relations (2.2) now read: 




T ( ^ 0 if j — k, 

^ = lo if j # k, 


j,k = 1, ...» n. 


Moreover, H n is just a scalar tridiagonal matrix. The condition 6 k ^ 0 in (2.7) is crucial, since 
each step of the classical Lanczos algorithm involves a division by 6 k . The point is that one cannot 
guarantee 6 k / 0, and in fact, when S k = 0 with v k # 0 and w k / 0, the algorithm breaks down. 
Note that 6 k as 0 signals a near-breakdown of the procedure. 

Algorithm 2.1 will handle exact and near- breakdowns in the classical Lanczos process, except 
for the special event of an incurable breakdown [52]. These axe situations when the look-ahead 
procedure would build an infinite block, without ever finding a nonsingular D Taylor [52] has 
shown in his Mismatch Theorem that in case of an incurable breakdown, one can still recover 
eigenvalue information. For linear systems, an incurable breakdown would require restarting the 
procedure with a different choice of starting vectors. Fortunately, in practice round-off errors will 
make an incurable breakdown highly unlikely. 

Finally, we remark that, for the important class of p-cyclic matrices A, exact breakdowns in 
the Lanczos process occur in a regular pattern. In this case, as was shown by Freund, Golub, 
and Hochbruck [16], look-ahead steps are absolutely necessary if one wants to exploit the p-cyclic 
structure. For details of a look-ahead Lanczos algorithm for p-cyclic matrices, we refer the reader 
to [16]. 


2.2 Historical Remarks and Related Work 

The problem of breakdowns in the classical Lanczos algorithm has been known from the beginning. 
Although a rare event in practice, the possibility of breakdowns has certainly brought the method 
into discredit and has prevented many people from actually using the algorithm. On the other 
hand, as was demonstrated by Cullum and Willoughby [8], the Lanczos process — even without 
look- ahead — is a powerful tool for sparse matrix computation. 

The Lanczos method has intimate connections with many other areas of Mathematics, such as 
form all y orthogonal polynomials (FOPs), Pade approximation, Hankel matrices, control theory, and 
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coding theory. The problem of breakdowns has a corresponding formulation in all of these areas, 
and remedies for breakdowns in these different settings have been known for quite some time. 
For example, the breakdown in the Lanczos process is equivalent to a breakdown of the generic 
three-term recurrence relation for FOPs, and it is well known how to overcome such breakdowns 
by modifying the recursions for FOPs (see [26, 9, 31] and the references given there). In the 
context of the partial realization problem in control theory, remedies for breakdowns were given 
in [36, 27]. The Lanczos process is also closely related to fast algorithms for the factorization of 
Hankel matrices, and again it was known how to overcome possible breakdowns of these algorithms 
(see [32, 22] and the references therein). However, in all these cases, only the problem of exact 
breakdowns was addressed. 

The look-ahead Lanczos algorithm of Taylor [52] and Parlett, Taylor, and Liu [44] was the first 
procedure that remedies both exact and near-breakdowns. We point out that their implementation 
is different from Algorithm 2.1. In particular, it always requires more work per step than Algo- 
rithm 2.1, and it does not reduce to the classical Lanczos process in the absence of look-ahead 
steps. Furthermore, in [52, 44], details are given only for the case of look-ahead steps of size 2, and 
their algorithm does not generalize easily to blocks of more than two vectors. 

In recent years, there has been a revival of the nonsymmetric Lanczos algorithm, and since 1990, 
in addition to the papers we have already cited in this section, there are several others dealing with 
various aspects of the Lanczos process. We refer the reader to [2, 3, 4, 22, 29, 34, 43] and the 
references given therein. 


3 The Quasi-Minimal Residual Approach 

We now return to linear systems (1.1). From now on, it is always assumed that the matrix A is 
nonsingular. In this section, we describe the QMR method. The procedure was first proposed by 
Freund [13, 15] for the case of complex symmetric matrices A = A T , and then extended by Freund 
and Nachtigal [19] for the case of general non-Hermitian matrices. 

3.1 The Standard QMR Algorithm 

Recall that the nth iterate of any Krylov subspace method is of the form (1-2). If now we choose 

tq = r 0 (3- 1 ) 

in Algorithm 2.1, then, by (2.1), the Lanczos vectors v 1 ,...,v n span £ n (r 0 ,A); hence we can write 

I n = X 0 d" ' 

for some z„ € C n . Together with (3.1) and (2.5), this gives the corresponding residual vector 

r„ = r 0 - AV n z n = V n+1 (e 1 - H n z n ), (3.2) 

where ei denotes the first unit vector in R n+1 . As V n+1 is not unitary, it is not possible to minimize 
the Euclidean norm of the residual without expending 0(Nn 2 ) work and O(Nn) storage. Instead, 
one min imizes just some weighted Euclidean norm of the coefficient vector in (3.2). More precisely, 
let 

= diagK,^,...,^^), Uj > 0, j = l,...,n + 1, 


(3.3) 
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be a weighting matrix. Then z n € C" is chosen as the solution of the least squares problem 


W u \ € \ - &nH n Z n \\ = IK e l - 

z6 c 


(3.4) 


Note that, in view of (2.6) and (3.3), the problem (3.4) always has a unique solution. Usually, the 
weights in (3.4) are chosen as = Ht^H, which means that all components in 

r n = (^n+l^n 1 ) (^1 e l — ^n^n z n) 


are treated equally. 

The least-squares problem (3.4) can be solved by standard techniques based on a QR decompo- 
sition off l n H n . One computes a unitary matrix Q n e d n+1)x(n+1) and an upper triangular matrix 
R n g C nXn such that 

QAH n = ["”], (3.5) 

and then obtains z n from 

z n = R-\, t n =^[/ n 0]Q n ej, (3.6) 

which gives 

x n ^x 0 + V n R-h n . (3.7) 

This gives the following QMR algorithm. 

Algorithm 3.1 (QMR algorithm) 

0) Choose x 0 E C N and set = r 0 = b - Ax 0 . 

Choose w t € with w x ^ 0. 

For n= 1, 2, ... , do : 

1) Perform the nth iteration of the look-ahead Lanczos Algorithm 2.1. 

This yields matrices V n , V n+1 , and H n which satisfy (2.5). 

2) Update the QR factorization (3.5) of H n and the vector t n in (3.6). 

3) Compute x n from (3.7). If x n has converged, stop. 

We note that Q n in (3.5) is just a product of n Givens rotations, and thus the vector t n is easily 
updated in step 2). Also, as H n is block tridiagonal, R n also has a block structure that is used in 
step 3) to update x n using only short recurrences. For complete details, see [19]. 

The quasi-minimization (3.4) is strong enough to obtain convergence results for QMR. One can 
derive error bounds for QMR that are comparable to those for GMRES. Also, it is possible to relate 
the norms of the QMR and GMRES residual vectors. This is in contrast to BCG and methods 
derived from BCG, for which no such convergence results are known. Finally, if desired, one can 
recover BCG iterates from the QMR Algorithm 3.1, at the expense of only one additional SAXPY 
per step. For these and other properties of QMR, we refer the reader to [19, 41]. 

Algorithm 3.1 is only one possible implementation of the QMR method. Instead of using three- 
term recurrences as in the underlying look-ahead Lanczos Algorithm 2.1, the basis vectors {u n } and 
^ W n } can also be generated by coupled two-term recurrences. Empirical observations indicate that. 
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in finite precision arithmetic, the latter approach is more robust than the former. Details of such 
an implementation of the QMR method based on coupled two-term recurrences with look-ahead 

are presented in [20]. T 

FORTRAN 77 implementations of the QMR Algorithm 3.1 and of the look-ahead Lanczos 

Algorithm 2.1 are available electronically from netlib. 1 


3.2 BCG and an Implementation of QMR without Look-Ahead 

We now look at BCG in more detail. The BCG algorithm attempts to generate iterates x® CG that 
are characterized by the Galerkin condition 

x„ CG € x 0 + K n (r 0 , A) and w T (b - Ax® CG ) = 0 for all w G K n (w 1 , A T ). (3.8) 

Unfortunately, such iterates need not exist for every n, and this is one source of possible breakdowns 

in BCG. . , , 

As noted already, BCG is closely related to the classical Lanczos process. More precisely, the 

BCG residual vectors are just scalar multiples of the right Lanczos vectors: 

r n = 6 — Ax® cg = 0 n v n+1 , 6 n G C, 0 n #O. (3-9) 

In addition to r n , the BCG algorithm also involves a second sequence of vectors r n 6 A„+i( r o> A T ). 
Here r 0 6 C N is an arbitrary nonzero starting vector; usually one sets f 0 = r 0 or chooses r 0 as a 
vector with random coefficients. The vectors f n are connected with the left vectors generated by 
the classical Lanczos process: 


= 0 r 


n w n+U 


K G C, 0 n #O. 


( 3 . 10 ) 


From (3.9) and (3.10), we have 


rl-Sn-l 


(3.11) 


Recall from (2.7) that the classical Lanczos process breaks down if w^v n - 0 with v n ^ 0 and 
w n ^ 0. In view of (3.11), this is equivalent to 

f n-l r n-l ~ ^# 0 . # 0 - 

As Algorithm 3.2 below shows, BCG also breaks down if (3.12) occurs. In addition to (3.12), there 
is a second source of breakdowns in BCG, namely 

ql-lMn-l = Of ?n— 1 ^ 9n-l ^ 0- ( 3 - 13 ) 

Here q n _ 1 and q n _ x are the vectors generated by Algorithm 3.2 below. It can be shewn (see, 
e.g., [46]) that a breakdown of the kind (3.13) occurs if, and only if, no Galerkin iterate x n with 

(3.8) exists. . 

Unlike the BCG iterates, the QMR iterates are always well defined by (2.6). In particular, 
breakdowns of the kind (3.13) can be excluded in the QMR Algorithm 3.1. We stress that this 

"To obtain the codes, one needs to send a message consisting of the single line “send lalqmr from misc" to 
netlib@ornl.gov . 
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remains true even if, in the QMR Algorithm 3.1, one uses the classical Lanczos process in step 1). 
Of course, the use of the look-ahead Lanczos Algorithm 2.1 avoids breakdowns of the first kind 
(3.12), except for incurable breakdowns. 

As already noted, existing BCG iterates can be easily obtained from quantities generated by 
the QMR Algorithm 2.1. Therefore, QMR can also be viewed as a stable implementation of BCG. 
It is also possible to reverse the roles of the two algorithms and to get QMR iterates directly from 
the BCG algorithm. Such an implementation of QMR without look-ahead was derived by Freund 
and Szeto in [21], and is as follows. 

Algorithm 3.2 (QMR without look-ahead from BCG) 

0) Choose x 0 E C N and set Xq MR = x^ CG = x 0 . 

Set q Q = r 0 = b- Ax 0 , p 0 = 0, r 0 = (^i||r 0 ||, = 0. 

Choose f 0 e C N , f 0 ^ 0, and set q 0 = f 0 , p 0 = f^r 0 . 

For n = 1,2,..., do : 

1) Set cr n _! = 

If cr n _ 1 = 0, stop. Otherwise, compute 

a n-l = Pn—l/^n—1 > 

r n = r n-l “ <*n-lMn-l> 

?n = fn - 1 ~ 


If BCG iterates are desired, set 


-BCG _ _BCG 
*71 - x n - 1 


+ a n-l9n-l- 


2) Compute 


I { ' 

" ” T n-\ 


Cn 


> X n = T n-l^n^ 


Pn = «n^n Pn-1 + c n a n-l9„-l , 


Pn = c n 
I QMR = I Q_MR + - n . 

3) If p n _i = 0, stop. Otherwise, compute 


Pn = f Zr n , 0 n = p n l Pn—\i 

9n = x n "h /^n9n— 1' 

9n = "h PnQn— 1‘ 


We remark that, exact for the additional update in step 2), this algorithm is just the classical 
BCG. Of course, unlike the QMR Algorithm 3.1, the implementation of QMR in Algorithm 3.2 can 
breakdown due to (3.12) and (3.13). 

Algorithm 3.2 is only one of several possible implementations of the BCG approach; see [34, 2S] 
for an overview of the different BCG variants. As in the nonsymmetric Lanczos process, exact 
and near- breakdowns in the BCG methods can be avoided by incorporating look-ahead procedures. 
Such look-ahead BCG algorithms have been proposed by Joubert [34] and Gutknecht [29]. 
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4 Transpose- Free Methods 


Krylov subspace methods such as BCG and QMR, which are based directly on the Lanczos process, 
involve matrix-vector products with A and A T . This is a disadvantage for certain applica ions, 
where A T is not readily available. It is possible to devise Lanczos-based Krylov subspace methods 
that do not involve the transpose of A. In this section, we give an overview of such transpose-free 

First, we consider the QMR algorithm. As pointed out by Freund and Zha [23], in principle 
it is always possible to eliminate A T altogether, by choosing the starting vector w t suitably. This 
observation is based on the fact that any square matrix is similar to its transpose. In particular, 
there always exists a nonsingular matrix P such that 


A T P ~ PA. 


(4.1) 


Now suppose that in the QMR Algorithm 3.1 we choose the special starting vector Wl - Pv } . Then, 
with (4.1), one readily verifies that the vectors generated by look-ahead Lanczos Algorithm . 

^ - for all n. (4-2) 


w n = Pv n 


Hence, instead of updating the left Lanczos vectors {w n } by means of the recursions in (2.3) or (2.4), 
thev can be computed directly from (4.2). The resulting QMR algorithm no longer involves the 
transpose of A; in exchange, it requires one matrix- vector multiplication with P m each iteration 
step Therefore, this approach is only viable for special classes of matrices A, for which one can 
find a matrix P satisfying (4.1) easily, and for which, at the same time, matrix-vector products 
with P can be computed cheaply. The most trivial case are real or complex symmetric matrices 
A = A t , which fulfill (4.1) with P = I. Another simple case are Toeplitz matrices A, i.e.. matrices 
whose entries are constant along each diagonal. Toeplitz matrices satisfy (4.1) with P = J, where 


J = 


0 

1 


0 

Li o 


1 

0 

oj 


is the N x N antidiagonal identity matrix. Finally, the condition (4.1) is also fulfilled for matrices 
of the form 

A = TM~\ P = M , 

where T and M are real symmetric matrices and M is nonsingular. Matrices of this type arise when 
real symmetric linear systems Tx = b are preconditioned by M. The resulting QMR algorithm for 
the solution of preconditioned symmetric Unear system has the same work and storage requirements 
as preconditioned SYMMLQ or MINRES [42]. However, the QMR approach ; s m °" e gene " a1 ’ “ 
that it can be combined with any nonsingular symmetric preconditioner M while SYMMLQ and 
MINRES require M to be positive definite (see, e.g., [24]). For strongly indefinite matrices T, the 
use of indefinite preconditioners M typically leads to considerably faster convergence; see [23] for 

nUm N”xt^we a ”rn ! to transpose- free variants of the BCG method. Sonneveld [50] with his CGS 
algorithm was the first to devise a transpose-free BCG-type scheme. Note that, in the 
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Algorithm 3.2, the matrix A T appears merely in the update formulas for the vectors f„ and q n . 
On the other hand, these vectors are then used only for the computation of the vector products 
Pn = fn r n and &n = <in A( ln- Sonneveld observed that, by rewriting these products, the transpose 
can be eliminated from the formulas, while at the same time one obtains iterates 

•®2n € "b ^'2n(^'o» ^)* fl — 1,2, . . ., ( 4 *^) 

that are contained in a Krylov subspace of twice the dimension, as compared to BCG. First, we 
consider p n . From Algorithm 3.2 it is obvious that 

r n = ^n( A ) r 0 and f n = ( 4 - 4 ) 

where V» n is the nth residual polynomials of the BCG process. With (4.4), one obtains the identity 

Pn = f $ (1>n( A )) 2 r o* ( 4 - 5 ) 

which shows that p n can be computed without using A T . Similarly, 

q n = <p n (A)r 0 and q n = <p n (A T )f 0 , 
for some polynomial ip n £ P n , and hence 

° n = tfA{<p n (A)) 2 r 0 . (4.6) 


By rewriting the vector recursions in Algorithm 3.2 in terms of rj} n and <p n and by squaring the 
resulting polynomial relations, Sonneveld showed that the vectors in (4.5) and (4.6) can be up- 
dated by means of short recursions. Furthermore, the actual iterates (4.3) generated by CGS are 
characterized by 

rf n GS = b - j4x 2n = (rf CG (A)) 2 r 0 . (4.7) 

Hence the CGS residual polynomials = (v>® CG ) are just the squared BCG polynomials. As 

pointed out earlier, BCG typically exhibits a rather erratic convergence behavior. As is clear from 
(4.7), these effects are magnified in CGS, and CGS typically accelerates convergence as well as 
divergence of BCG. Moreover, there are cases for which CGS diverges, while BCG still converges. 

For this reason, more smoothly converging variants of CGS have been sought. Van der Vorst [53] 
was the first to propose such a method. His Bi-CGSTAB again produces iterates of the form (4.3), 
but instead of squaring the BCG polynomials as in (4.7), the residual vector is now of the form 

r 2n = 4’n CG ( A )Xn( A ) r 0- 


Here x n € wit ^ X n (0) = 1, is a polynomial that is updated from step to step by adding a new 
linear factor: 

Xn (A) = (l- I7n A) Xn _ 1 (A). (4.8) 

The free parameter r/ n in (4.8) is determined by a local steepest descent step, i.e., T) n is the optimal 
solution of 

min ||(7 -7 ? A) Xn -i(^ CG (^) r oll- 
veC 
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Due to the steepest descent steps, Bi-CGSTAB typically has much smoother convergence behavior 
than BCG or CGS. However, the norms of the Bi-CGSTAB residuals may still oscillate considerably 
for difficult problems. Finally, Gutknecht [30] has noted that, for real A, the polynomials Xn ^ 
always have real roots only, even if A has complex eigenvalues. He proposed a variant of Bi- 
CGSTAB with polynomials (4.8) that are updated by quadratic factors in each step and thus can 
have complex roots in general. 

In the CGS algorithm, the iterates (4.3) are updated by means of a formula of the form 


= Z 2 (n-1) + a n-\{V2n-\ + Din)- 


(4.9) 


Here the vectors y lt y 2 , . . . , y 2n satisfy 

span{jq, y 2 , • • •> = ^m( r o*^)> m= 1,2, 

In other words, in each iteration of the CGS algorithm two search directions y 2n _i and j/ 2n are 
available, while the actual iterate is updated by the one-dimensional step (4.9) only. Based on 
this observation, Freund [14] has proposed a variant of CGS that makes use of all available search 
directions. More precisely, instead of one iterate x£ n GS per step it produces two iterates x 2n _! and 
x 2n of the form 

J, -• !U*m. < 4 - 10 > 

Furthermore, the free parameter vector z m in (4.10) can be chosen such that the iterates satisfy 
a quasi-minimal residual condition, similar to the quasi- minimization property of the QMR Al- 
gorithm 3.4. For this reason, the resulting scheme is called transpose-free quasi-minimal residual 
algorithm (TFQMR). For details, we refer the reader to [14], where the following implementation 
of TFQMR is derived. 


Algorithm 4.1 (TFQMR algorithm) 

0) Choose x 0 G C N . 

Set w x = y 1 = r 0 = b - Ax 0 , v 0 = Ay x , d 0 = 0. 
Set r 0 = ||r 0 ||, t? 0 = 0, % = 0. 

Choose f 0 G C N , f 0 ^ 0, and set p 0 = fjr 0 . 
For n = 1,2, . . ., do : 

1) Compute 

^-1 = ^-1. 

a n-l = 


V2n = V2n-l ~ 


2) For ttx = 2rt — 1, 2n do : 


Compute 

w m+l = W m~ O n _iAy m , 

t?m = ll«\n+lll/ r m-l» C m = 1/ \A + 

T m = T m-\^m c mi Vm ~ C m a n-1’ 

d m = Vm + (^m-l^m-l/ Q n-l)^m-li 
x m ~ x m — 1 "I" Vm^m- 
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If x m has converged, stop. 

3) Compute 

Pn = f O w 2n+l> 
fin Pn! Pn— 1> 

J^n+l = ^271+1 + finVini 

v n = A V2n+l + fin( A V2n + fin v n-l)- 

We would like to point out that the iterates generated by the QMR Algorithm 3.1 and the 
TFQMR Algorithm 4.1 are different in general. 

Another transpose-free QMR method was proposed by Chan, de Pillis, and Van der Vorst [5]. 
Their scheme is mathematically equivalent to the QMR Algorithm 3.1, when the latter is based on 
the classical Lanczos process without look-ahead. The method first uses a transpose-free squared 
version of the Lanczos algorithm (see, e.g., Gutknecht [28]) to generate the scalar tridiagonal 
Lanczos matrix H n . The right Lanczos vectors v n are then computed by running the standard 
Lanczos recurrence, and finally the QMR iterates are obtained as in Algorithm 3.1. Freund and 
Szeto [21] have derived yet another transpose-free QMR scheme, which is modeled after CGS and 
is based on squaring the residual polynomials of the standard QMR Algorithm 3.1. However, the 
algorithm given in [5] and the squared QMR approach both require three matrix-vector products 
with A at each iteration, and hence they are more expensive than CGS, Bi-CGSTAB, or TFQMR, 
which involve only two such products per step. 

Finally, we remark that none of the transpose-free methods considered in this section, except for 
Freund and Zha’s simplified QMR algorithm based on (4.1), addresses the problem of breakdowns. 
Indeed, in exact arithmetic, all these schemes break down every time a breakdown occurs in the 
BCG Algorithm 3.2. Practical look-ahead techniques for avoiding exact and near- breakdowns in 
these transpose-free methods still have to be developed. 

5 Concluding Remarks 

In this paper, we have covered only some of the recent advances in iterative methods for non- 
Hermitian linear systems. A more extensive survey of recent developments in this field can be 
found in [17]. 

The introduction of CGS in the 1980s spurred renewed interest in the nonsymmetric Lanczos 
algorithm, with most of the effort directed towards obtaining a method with better convergence 
properties than BCG or CGS. Several BCG-based algorithms were proposed, such as Bi-CGSTAB, 
introduced by Van der Vorst [53]. The quasi-minimal residual technique was introduced by Fre- 
und [13, 15] in the context of complex symmetric systems, then later coupled with a new variant of 
the look-ahead Lanczos approach to obtain a general non-Hermitian QMR algorithm [19]. Finally, 
several transpose-free algorithms based on QMR have been introduced recently, which trade the 
multiplication by A T for one or more multiplications by A. However, their convergence properties 
are not well understood, and none of these algorithms have been combined with look-ahead tech- 
niques yet. In general, it seems that the transpose-free methods have more numerical problems than 
the corresponding methods that use A T , and more research is needed into studying their behavior. 

Finally, even though the field of iterative methods has made great progress in the last few years, 
it is still in its infancy, especially with regard to the packaged software available. Whereas there are 
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well-established robust general-purpose solvers based on direct methods, the same cannot be said 
about solvers based on iterative methods. There are no established iterative packages of the same 
robustness and wide acceptance as, for example, the LINPACK library, and as a result many of the 
scientists who use iterative methods write their own specialized solvers. We feel that this situation 
needs to change, and we would like to encourage researchers to provide code for their methods. 
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